Skip to content

Conversation

@tiedotguy
Copy link
Collaborator

We have two general code paths:

  1. Merge two Tags in to one
  2. Merge two Tags in to one, excluding certain tags

We used to handle number 1 by passing an empty map to number 2, however these can be seen as fast (add static tags) and slow (add static tags + perform filtering) code-paths.

The fast code path is now optimized by performing an array search instead of using a map to track things that have been seen. This is more efficient for small values. There is diminishing returns eventually, but that's with a high number of tags, which is an anti-pattern.

The primary benefit is in removing the map allocation.

The slow code path is unchanged, but could be improved by ensuring the map is sufficiently large, or using the same linear search method to check if a tag should be excluded.

Note: this benchmark isn't perfect because we clone the source arrays, but that can be identified in the profile.

BenchmarkUniqueTagsPractical/original-22                40025511               300.1 ns/op           304 B/op          3 allocs/op
BenchmarkUniqueTagsPractical/prealloc-22                38560572               308.3 ns/op           304 B/op          3 allocs/op
BenchmarkUniqueTagsPractical/array-search-22            56932317               210.9 ns/op           304 B/op          3 allocs/op

We have two general code paths:

1. Merge two Tags in to one
2. Merge two Tags in to one, excluding certain tags

We used to handle number 1 by passing an empty map to number 2, however these
can be seen as fast (add static tags) and slow (add static tags + perform
filtering) code-paths.

The fast code path is now optimized by performing an array search instead of
using a map to track things that have been seen.  This is more efficient for
small values.  There is diminishing returns eventually, but that's with a high
number of tags, which is an anti-pattern.

The primary benefit is in removing the map allocation.

The slow code path is unchanged, but could be improved by ensuring the map
is sufficiently large, or using the same linear search method to check if
a tag should be excluded.

Note: this benchmark isn't perfect because we clone the source arrays, but that
can be identified in the profile.

BenchmarkUniqueTagsPractical/original-22                40025511               300.1 ns/op           304 B/op          3 allocs/op
BenchmarkUniqueTagsPractical/prealloc-22                38560572               308.3 ns/op           304 B/op          3 allocs/op
BenchmarkUniqueTagsPractical/array-search-22            56932317               210.9 ns/op           304 B/op          3 allocs/op
@tiedotguy tiedotguy requested review from akavatl, hstan and irisgve July 2, 2025 21:26
b.Run(name, func(b *testing.B) {
b.ReportAllocs()
for b.Loop() {
_ = f(slices.Clone(t1), slices.Clone(t2))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we don't need to clone t2, but I had the code written...

Copy link
Member

@hstan hstan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I discovered a bug when trying to run a unit test to understand how the uniqueTagsSimple works:
with input:

{
	name:     "multiple duplicates",
	t1:       gostatsd.Tags{"a", "b", "a", "c", "b"},
	t2:       gostatsd.Tags{"a", "b", "c"},
	expected: gostatsd.Tags{"a", "b", "c"},
},

the output of uniqueTagsSimple(t1, t2) is gostatsd.Tags{"a", "b", "b", "c"} instead of gostatsd.Tags{"a", "b", "c"}


last := len(t1)
for idx := 1; idx < last; { // start at 1 because we know the first item will be unique.
if slices.Contains(t1[:idx-1], t1[idx]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should here be t1[:idx] to reach the last element? since the loop condition is idx < len(t1)?

(see my other comment for a failed test case)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants